Classification of HCV NS5B Polymerase Inhibitors Using Support Vector Machine

Wang, M.L.; Wang, K.; Yan, A.X.*
International Journal of Molecular Sciences, 2012, 13(4), 4033-4047.

    Using a support vector machine (SVM), three classification models were built to predict whether a compound is an active or a weakly active inhibitor, based on a dataset of 386 non-nucleoside analogue inhibitors (NNIs) of hepatitis C virus (HCV) NS5B polymerase that bind at the NNI III allosteric site. For each molecule, global descriptors and 2D and 3D property autocorrelation descriptors were calculated with the program ADRIANA.Code. The three models were built from different combinations of these descriptor types. Model 2, based on 16 global and 2D autocorrelation descriptors, gave the highest prediction accuracy on the test set, 88.24%, with a Matthews correlation coefficient (MCC) of 0.789. Model 1, based on 13 global descriptors, gave the highest prediction accuracy on an external test set of 80 newly collected compounds, 86.25%, with an MCC of 0.732. Several molecular properties played important roles in the interaction between the ligand and NS5B polymerase: molecular shape descriptors (InertiaZ, InertiaX and Span), the number of rotatable bonds (NRotBond), water solubility (LogS), and hydrogen-bonding-related descriptors.


Classification model performance (dataset: 386 HCV NS5B polymerase inhibitors)

| Model | Algorithm | Descriptors | Training set accuracy (%) | Test set SE (%) | Test set SP (%) | Test set accuracy (%) | Test set MCC |
|---|---|---|---|---|---|---|---|
| Model 1 | SVM | 13 CORINA global | 87.97 | 97.92 | 61.11 | 78.43 | 0.625 |
| Model 2 | SVM | 5 CORINA global, 11 CORINA 2D | 95.49 | 100.00 | 77.78 | 88.24 | 0.789 |
| Model 3 | SVM | 9 CORINA global, 10 CORINA 3D | 95.11 | 100.00 | 64.81 | 81.37 | 0.681 |
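
The test set columns (SE = sensitivity, SP = specificity, accuracy and MCC) all follow from a single confusion matrix per model. The sketch below shows how they relate. The page does not give the confusion-matrix counts, so the counts in `ASSUMED_COUNTS` are an assumption: they were back-calculated from the reported percentages, taking a test set of 48 active and 54 weakly active compounds and treating "active" as the positive class. The function name `classification_metrics` is likewise illustrative and is not taken from the paper.

```python
import math


def classification_metrics(tp, fn, tn, fp):
    """Return sensitivity, specificity, accuracy (all in %) and MCC."""
    se = 100.0 * tp / (tp + fn)            # sensitivity: recall on the positive class
    sp = 100.0 * tn / (tn + fp)            # specificity: recall on the negative class
    acc = 100.0 * (tp + tn) / (tp + fn + tn + fp)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return se, sp, acc, mcc


# ASSUMPTION: (TP, FN, TN, FP) counts inferred from the reported percentages,
# assuming 48 active and 54 weakly active test compounds. Not stated in the source.
ASSUMED_COUNTS = {
    "Model 1": (47, 1, 33, 21),
    "Model 2": (48, 0, 42, 12),
    "Model 3": (48, 0, 35, 19),
}

for name, counts in ASSUMED_COUNTS.items():
    se, sp, acc, mcc = classification_metrics(*counts)
    print(f"{name}: SE={se:.2f} SP={sp:.2f} ACC={acc:.2f} MCC={mcc:.3f}")
```

Under these assumed counts, the computed values match the test set figures reported for all three models. Models 2 and 3 both reach 100% sensitivity, so the differences in their accuracy and MCC come entirely from false positives, that is, weakly active compounds predicted as active.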